We are all excited by the progress many authors have made in making their papers reproducible by publishing associated code and data.
We know how challenging this can be, so we want to showcase the value of the practice, both for original authors and as a learning experience for those who attempt to reproduce the work.
During a ReproHack, participants attempt to reproduce published research of their choice from a list of proposed papers with publicly available associated code and data. Participants get to work with other people’s material in a low-pressure environment and record their experiences on a number of key aspects, including reproducibility, transparency and reusability of materials. At the end of the day we regroup, share our experiences and give feedback to the authors.
It’s imperative to note that ReproHacks are by no means an attempt to criticise or discredit work. We see reproduction as a beneficial scientific activity in itself, with useful outcomes for authors and valuable learning experiences for the participants and the research community as a whole.
We strive to make this event open and inclusive to all. As such, we ask you to read our Code of Conduct. By participating, you are expected to uphold this code.
Join us at the ReproHack and get working with other people’s material.
Practical experience in reproducibility with real published materials and the opportunity to explore different tools and strategies.
Inspiration from working with other people’s code and data.
An appreciation that reproducibility is non-trivial but that opening up your work for more people to engage with is the best way to help improve it.
An appreciation that reproducibility has community value beyond just the validation of the results. For example, access to such materials increases the potential for reuse and understanding of the work.
Assessment of how reproducible papers are ‘out of the box’.
Evaluation of how successful current practices are and for what purpose.
Identification of what works and where the most pressing weaknesses in our approaches are.
You’ve put a lot of effort into making your work reproducible. Now let people learn from and engage with it!
We invite nominations for papers that have both associated code and data publicly available. We also encourage analyses based on open source tools, as we cannot guarantee participants will have access to specialised licensed software.
Karch, J. (2019, September 16). Improving on Adjusted R-Squared. https://doi.org/10.31234/osf.io/v8dz5
submitted by Julian D. Karch
Why should we attempt to reproduce this paper?
First, I think I came quite far in making it reproducible. The paper uses a simulation study. All the steps from the raw results of the simulation study to the final manuscript can be reproduced by clicking one button. This works because all code and all dependencies are stored online within a virtual machine that anybody can access. So, I think it is quite a good example for others to learn from. Second, the code running the simulation study itself is not included in this process because it would take far too long to run on one machine. I ran the code on a supercomputer. I would be interested in how people would try to reproduce such long-running code and whether they have feedback on how to improve the sharing of such long-running code.
Paper URL: https://doi.org/10.31234/osf.io/v8dz5
Data URL: https://doi.org/10.24433/CO.8023088.v1
Code URL: https://doi.org/10.24433/CO.8023088.v1
Useful programming skills: R
Spatial modelling of rice yield losses in Tanzania due to bacterial leaf blight and leaf blast in a changing climate. C. Duku, A. H. Sparks, S. J. Zwart. Climatic Change 135.3-4 (2016) pp. 569–583. Springer Nature. doi: 10.1007/s10584-015-1580-2
Why should we attempt to reproduce this paper?
This was my third attempt at making a paper fully reproducible. To date it’s the most reproducible paper that I have published. I’m interested to know what stumbling blocks exist that I’m not aware of (aside from needing software like ArcGIS to fully rerun the complete analysis).
Data URL: https://figshare.com/articles/MICORDEA/1408501
Code URL: https://github.com/adamhsparks/MICCORDEA
Useful programming skills: R, Python, ArcGIS
Sparks, A. H., Forbes, G. A., Hijmans, R. J., & Garrett, K. A. (2014). Climate change may have limited effect on global risk of potato late blight. Global Change Biology. doi:10.1111/gcb.12587
Why should we attempt to reproduce this paper?
This is a two-for-one. The repository contains code for two companion papers: the model development, and the model implementation and analysis. As the repository notes, some data are not freely available, so I’ve made an effort to allow the paper to be replicated as well as possible with what’s available.
Paper URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/gcb.12587
Code URL: https://github.com/adamhsparks/Global-Late-Blight-MetaModelling
Useful programming skills: R
Tennant, J. P., Mannion, P. D., & Upchurch, P. (2016). Sea level regulated tetrapod diversity dynamics through the Jurassic/Cretaceous interval. Nature Communications, 7, 12737.
Why should we attempt to reproduce this paper?
Because it’s a fun paper, involving dinosaurs! But one which I myself have also attempted to reproduce in the past, and struggled with. There are a few additional tweaks that might throw some people off too.
Paper URL: https://www.nature.com/articles/ncomms12737
Data URL: https://www.nature.com/articles/ncomms12737#supplementary-information
Code URL: https://www.nature.com/articles/ncomms12737#supplementary-information
Useful programming skills: R, Perl
Del Ponte EM, Nelson SC, Pethybridge SJ (2019) Evaluation of App-Embedded Disease Scales for Aiding Visual Severity Estimation of Cercospora Leaf Spot of Table Beet. Plant Disease 103:1347-1356. doi: 10.1094/PDIS-10-18-1718-RE
submitted by Emerson M. Del Ponte
Why should we attempt to reproduce this paper?
The data and code are written in RMarkdown, which makes it possible to reproduce the entire analysis and the plots shown in the paper. It also makes it possible to generate an HTML document, a nice interface that helps the reader better understand why some procedures were adopted and how to run them.
Paper URL: https://apsjournals.apsnet.org/doi/10.1094/PDIS-10-18-1718-RE
Data URL: https://osf.io/ezxps/
Code URL: https://github.com/emdelponte/paper-estimate-app
Useful programming skills: R
Phys. Chem. Chem. Phys., 2019, 21, 6133-6141
Why should we attempt to reproduce this paper?
I believe this represents the only example of a reproducible paper from scattering data collected at Diamond Light Source (UK) and the Institut Laue-Langevin (France).
Paper URL: https://doi.org/10.1039/c9cp00203k
Data URL: https://doi.org/10.15125/BATH-00548
Code URL: https://doi.org/10.5281/zenodo.2577796
Useful programming skills: Python, make
K. Hinsen and G.R. Kneller, J. Chem. Phys. 145, 151101 (2016)
Why should we attempt to reproduce this paper?
This is one of the very few papers in biomolecular simulation for which code and data are available and which should be reproducible. But it is also three years old, so it is an interesting test case for the longevity of reproducible research. The infrastructure software is available at http://www.activepapers.org/python-edition/ (with instructions for installation and use).
Paper URL: https://doi.org/10.1063/1.4965881
Data URL: https://doi.org/10.5281/zenodo.162171
Code URL: https://doi.org/10.5281/zenodo.162171
Useful programming skills: Python
Memarzadeh, M., & Boettiger, C. (2019). Resolving the Measurement Uncertainty Paradox in Ecological Management. The American Naturalist, 193(5). https://doi.org/10.1086/702704
Why should we attempt to reproduce this paper?
This will probably be a non-trivial example to reproduce, owing to: (1) long-running code, (2) dependency on external data sources, (3) possibly challenging software dependencies – both trivial ones (e.g. setting up custom fonts and plot themes) and critical ones (it requires an external R package wrapping a C++ algorithm, which is not available on CRAN and can sometimes have interesting compiler issues, like when Apple decided to break the clang compiler in 10.0). Ideally you could just run the R code given in the appendix in your local R session, but that may take a bit of effort. We’ve tried to take steps to address those issues by providing caches of slow-running parts, providing a Docker container, and providing sufficient annotations, but who knows!
Paper URL: https://doi.org/10.1086/702704
Data URL: NA
Code URL: https://github.com/boettiger-lab/pomdp-intro
Useful programming skills: R
Prudic KL, Oliver JC, Brown BV, Long EC. 2018. Comparisons of Citizen Science Data-Gathering Approaches to Evaluate Urban Butterfly Diversity. Insects. 9(4):E186. doi: 10.3390/insects9040186
Why should we attempt to reproduce this paper?
This is a fairly digestible paper with statistical analyses and data visualization that rely heavily on open data from citizen science projects.
Paper URL: https://doi.org/10.3390/insects9040186
Data URL: https://doi.org/10.5281/zenodo.1436741
Code URL: https://doi.org/10.5281/zenodo.1436741
Useful programming skills: R
Eglen SJ (2016) Bivariate spatial point patterns in the retina: a reproducible review. Journal de la Société Française de Statistique 157:33–48.
Why should we attempt to reproduce this paper?
Tell me what I should improve!
Paper URL: https://github.com/sje30/eglen2015
Data URL: NA
Code URL: NA
Useful programming skills: R
https://doi.org/10.1186/2047-217X-3-3
Why should we attempt to reproduce this paper?
Tell me what I can improve on; maybe think of other visualisations for data?
Paper URL: https://doi.org/10.1186/2047-217X-3-3
Data URL: NA
Code URL: http://www.damtp.cam.ac.uk/user/eglen/waverepo/
Useful programming skills: R
Kamvar ZN, Amaradasa BS, Jhala R, McCoy S, Steadman JR, Everhart SE. 2017. Population structure and phenotypic variation of Sclerotinia sclerotiorum from dry bean (Phaseolus vulgaris) in the United States. PeerJ 5:e4152 https://doi.org/10.7717/peerj.4152
Why should we attempt to reproduce this paper?
This paper is reproduced weekly in a Docker container on continuous integration, but it is also set up to work via local installs. It would be interesting to see if it’s reproducible with a human operator who knows nothing of the project or toolchain.
Paper URL: https://peerj.com/articles/4152/
Data URL: https://osf.io/k8wtm
Code URL: https://github.com/everhartlab/sclerotinia-366
Useful programming skills: R, Make and knowledge of Docker containers
Schneider P, Van Gool C, Spreeuwenberg P, Hooiveld M, Donker GA, Barnett DJ, Paget J. Using digital epidemiology methods to monitor influenza-like illness in the Netherlands in real-time: the 2017-2018 season. BioRxiv. 2018 Jan 1:440867.
Why should we attempt to reproduce this paper?
This preprint is an attempt to reproduce Google Flu Trends in the Netherlands.
The whole paper and its code are meant to be easily reproducible and transferable to other countries and/or areas. If you are familiar with time series data, lasso regression and cross-validation, the analysis should be straightforward.
If anyone is interested, I could also provide influenza data for other European countries.
Paper URL: https://www.biorxiv.org/content/10.1101/440867v1.full
Data URL: https://zenodo.org/record/1459862#.XQbNIG8vNPM
Code URL: https://zenodo.org/record/1459862#.XQbNIG8vNPM
Useful programming skills: R
Bulletin of the National Museum of Nature and Science, Series B (Botany) 45: 77–86
Why should we attempt to reproduce this paper?
It uses the drake R package that should make reproducibility of R projects much easier (just run make.R and you’re done). However, it does depend on very specific package versions, which are provided by the accompanying docker image.
Paper URL: https://www.joelnitta.com/publication/2019-03-27_pleurosoriopsis/
Data URL: https://github.com/joelnitta/pleurosoriopsis
Code URL: https://github.com/joelnitta/pleurosoriopsis
Useful programming skills: R
Open Trade Statistics (2019, April). OTS BETA DASHBOARD. https://shiny.tradestatistics.io/ (Accessed: June 22, 2019)
Why should we attempt to reproduce this paper?
The focus of the project is reproducibility. Here we show how accessing our data differs from similar initiatives: https://ropensci.org/blog/2019/05/09/tradestatistics/. Also, similar projects have obscure parts, while ours exposes the code from raw data downloading to dashboard creation.
Paper URL: https://shiny.tradestatistics.io
Data URL: https://api.tradestatistics.io
Code URL: https://github.com/tradestatistics
Useful programming skills: R, Shiny
Seibold, H., Zeileis, A. and Hothorn, T., 2019. model4you: An R Package for Personalised Treatment Effect Estimation. Journal of Open Research Software, 7(1), p.17. DOI: http://doi.org/10.5334/jors.219
Why should we attempt to reproduce this paper?
I guess it could be a cool learning experience. The paper is written with knitr, uses a seed, is part of the R package it describes, was openly written using version control (SVN, R-Forge) and is available in an open access journal (@up_jors).
Paper URL: http://doi.org/10.5334/jors.219
Data URL: NA
Code URL: https://r-forge.r-project.org/scm/viewvc.php/pkg/model4you/inst/JORS/?root=partykit
Useful programming skills: R, knitr (LaTeX), version control (SVN)
Holzleitner et al. (In press). Comparing theory-driven and data-driven attractiveness models using images of real women’s faces, JEP:HPP.
submitted by Ben Jones
Why should we attempt to reproduce this paper?
Complex analyses over multiple variables. In press, so we can still fix errors ahead of publication!!
Paper URL: https://psyarxiv.com/vhc5k
Data URL: https://osf.io/jurcq/
Code URL: https://osf.io/jurcq/
Useful programming skills: R
10:00 Coffee and Tea ☕
10:30 Welcome
10:40 Presentation about tools for Reproducible Research by Dr. Anna Krystalli
11:30 Forming groups and starting the hackathon 👨🏻💻 👨🏼💻
12:30 Lunch buffet 🥗 🥙
14:30 2nd presentation (tbc)
15:00 Continue hacking 👨🏽💻👩🏼💻
16:30 Drinks and bites 🍻